Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry
Abstract
Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will suffer from communication challenges and technology limitations. We review and investigate the applicability of data science and unsupervised machine learning (ML) techniques to isotope ratio mass spectrometry (IRMS) data from volatile laboratory analogs of Europa and Enceladus seawaters, as a case study for the development of new strategies for icy world missions. Our driving goal is to determine whether the spectra of these gases could contain information about the seawater composition and biosignatures. We implement ML techniques to investigate what inherent information the spectra contain and whether a pipeline could be designed to quickly analyze data from future missions. In this study, we focus on the exploratory data analysis (EDA) step in the analytics pipeline. This is a crucial step that allows us to understand the data in depth before subsequent steps such as predictive/supervised learning. EDA identifies and characterizes recurring patterns and significant correlation structure, and helps determine which variables are redundant and which contribute to variation in the lower-dimensional space. In addition, EDA helps identify irregularities such as outliers that might be due to poor data quality. We compared the dimensionality reduction methods Uniform Manifold Approximation and Projection (UMAP) and Principal Component Analysis (PCA) for transforming our data from a high-dimensional space to a lower dimension, and compared clustering algorithms for identifying data-driven groups (“clusters”) in the analog IRMS data and for mapping these clusters to experimental conditions such as CO₂ concentration. Such characterization efforts are the first steps toward longer-term autonomy, where similar automated tools could be used onboard spacecraft to prioritize data transmissions for bandwidth-limited outer Solar System missions.
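The EDA steps described above (reduce the dimensionality of the spectra, cluster the reduced data, then check how the clusters line up with experimental conditions) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the spectra, the "CO₂-sensitive" channels, and the helper names (`make_synthetic_spectra`, `pca`, `kmeans`) are hypothetical. PCA is computed with NumPy's SVD, and k-means is an assumed stand-in for the clustering algorithms, which the abstract does not name. UMAP, the other method compared, is provided by the third-party umap-learn package and is omitted here to keep the example NumPy-only.

```python
import numpy as np
from collections import Counter


def make_synthetic_spectra(n_per_group=20, n_channels=8, seed=1):
    """Hypothetical stand-in for analog IRMS data: rows are samples, columns are channels."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(1.0, 2.0, size=n_channels)  # shared baseline intensity per channel
    low = base + rng.normal(0.0, 0.05, size=(n_per_group, n_channels))
    high = base + rng.normal(0.0, 0.05, size=(n_per_group, n_channels))
    high[:, 2:5] += 1.0  # assumed CO2-sensitive channels shift together under high CO2
    X = np.vstack([low, high])
    condition = np.array(["low_CO2"] * n_per_group + ["high_CO2"] * n_per_group)
    return X, condition


def pca(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix."""
    Z = X - X.mean(axis=0)  # centre each channel; no unit-variance scaling here
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    loadings = Vt[:n_components]  # rows: principal axes in channel space
    scores = Z @ loadings.T  # samples projected into the low-dimensional space
    return scores, explained[:n_components], loadings


def kmeans(P, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [P[0]]
    for _ in range(1, k):
        # next center: the point farthest from all centers chosen so far
        d = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)  # assign each sample to its nearest center
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


X, condition = make_synthetic_spectra()
scores, explained, loadings = pca(X, n_components=2)
labels, _ = kmeans(scores, k=2)
# Cross-tabulate clusters against experimental conditions.
table = Counter(zip(labels.tolist(), condition.tolist()))
print("explained variance ratio:", explained.round(3))
print("channels loading most on PC1:", np.argsort(np.abs(loadings[0]))[-3:])
print("cluster x condition counts:", dict(table))
```

The data are mean-centred but not scaled to unit variance. With only a few informative channels, standardizing would give every noise channel the same variance as the signal channels and can bury the group structure. For real IRMS data, whether to scale or log-transform intensities is a data-dependent decision. The PC1 loadings show which channels drive the separation, and the cross-tabulation is the step that maps clusters back to experimental conditions.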
Similar resources
Using Machine Learning for Exploratory Data Analysis
This tutorial will introduce attendees to fundamental concepts in the clustering and dimensionality reduction fields of unsupervised machine learning. Attendees will learn about the assumptions algorithms make and how those assumptions can cause the algorithms to be more or less suited to particular datasets. Hands-on interaction with machine learning algorithms on real and synthetic data are a...
Exploratory Machine Learning Analysis of Real Network Log Data
Intrusion detection systems often rely on hard checks of incoming requests to identify whether traffic is safe or malicious. Various machine learning approaches have been developed to mine large-scale network logs and help to identify anomalous traffic patterns. In this paper, we apply several machine learning approaches to real data from the MIT network. We describe how these methods could be ...
Improved classification of mass spectrometry database search results using newer machine learning approaches.
Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of a...
Machine Learning Models for Housing Prices Forecasting using Registration Data
This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...
Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data
Microbial identification is a central issue in microbiology, in particular in the fields of infectious diseases diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical classification where the proximity between species is generally measured in terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the inform...
Journal
Journal title: Frontiers in Astronomy and Space Sciences
Year: 2023
ISSN: 2296-987X
DOI: https://doi.org/10.3389/fspas.2023.1134141